Fuzzy-Rough Simultaneous Attribute Selection and Feature Extraction Algorithm

نویسندگان

  • Pradipta Maji
  • Partha Garai
چکیده

Among the huge number of attributes or features present in real-life data sets, only a small fraction of them are effective to represent the data set accurately. Prior to analysis of the data set, selecting or extracting relevant and significant features is an important preprocessing step used for pattern recognition, data mining, and machine learning. In this regard, a novel dimensionality reduction method, based on fuzzy-rough sets, that simultaneously selects attributes and extracts features using the concept of feature significance is presented. The method is based on maximizing both the relevance and significance of the reduced feature set, whereby redundancy therein is removed. This paper also presents classical and neighborhood rough sets for computing the relevance and significance of the feature set and compares their performances with that of fuzzy-rough sets based on the predictive accuracy of nearest neighbor rule, support vector machine, and decision tree. An important finding is that the proposed dimensionality reduction method based on fuzzy-rough sets is shown to be more effective for generating a relevant and significant feature subset. The effectiveness of the proposed fuzzy-rough-set-based dimensionality reduction method, along with a comparison with existing attribute selection and feature extraction methods, is demonstrated on real-life data sets.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A hybrid filter-based feature selection method via hesitant fuzzy and rough sets concepts

High dimensional microarray datasets are difficult to classify since they have many features with small number ofinstances and imbalanced distribution of classes. This paper proposes a filter-based feature selection method to improvethe classification performance of microarray datasets by selecting the significant features. Combining the concepts ofrough sets, weighted rough set, fuzzy rough se...

متن کامل

Fuzzy-rough Information Gain Ratio Approach to Filter-wrapper Feature Selection

Feature selection for various applications has been carried out for many years in many different research areas. However, there is a trade-off between finding feature subsets with minimum length and increasing the classification accuracy. In this paper, a filter-wrapper feature selection approach based on fuzzy-rough gain ratio is proposed to tackle this problem. As a search strategy, a modifie...

متن کامل

Consistency-preserving attribute reduction in fuzzy rough set framework

Attribute reduction (feature selection) has become an important challenge in areas of pattern recognition, machine learning, data mining and knowledge discovery. Based on attribute reduction, one can extract fuzzy decision rules from a fuzzy decision table. As consistency is one of several criteria for evaluating the decision performance of a decision-rule set, in this paper, we devote to prese...

متن کامل

A fuzzy rough set approach for incremental feature selection on hybrid information systems

In real-applications, there may exist many kinds of data (e.g., boolean, categorical, real-valued and set-valued data) and missing data in an information system which is called as a Hybrid Information System (HIS). A new Hybrid Distance (HD) in HIS is developed based on the value difference metric, and a novel fuzzy rough set is constructed by combining the HD distance and the Gaussian kernel. ...

متن کامل

Diagnosis of the disease using an ant colony gene selection method based on information gain ratio using fuzzy rough sets

With the advancement of metagenome data mining science has become focused on microarrays. Microarrays are datasets with a large number of genes that are usually irrelevant to the output class; hence, the process of gene selection or feature selection is essential. So, it follows that you can remove redundant genes and increase the speed and accuracy of classification. After applying the gene se...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • IEEE transactions on cybernetics

دوره 43 4  شماره 

صفحات  -

تاریخ انتشار 2013